Performance Analysis of the Kahan-Enhanced Scalar Product on Current Multicore Processors

Authors

  • Johannes Hofmann
  • Dietmar Fey
  • Michael Riedmann
  • Jan Eitzinger
  • Georg Hager
  • Gerhard Wellein

Abstract

We investigate the performance characteristics of a numerically enhanced scalar product (dot) kernel loop that uses the Kahan algorithm to compensate for numerical errors, and describe efficient SIMD-vectorized implementations on recent Intel processors. Using low-level instruction analysis and the execution-cache-memory (ECM) performance model, we pinpoint the relevant performance bottlenecks for single-core and thread-parallel execution, and predict performance and saturation behavior. We show that the Kahan-enhanced scalar product comes at almost no additional cost compared to the naive (non-Kahan) scalar product if appropriate low-level optimizations, notably SIMD vectorization and unrolling, are applied. We also investigate the impact of architectural changes across four generations of Intel Xeon processors.
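
For orientation, the two kernels compared in the abstract can be sketched in scalar C as follows. This is a minimal illustration of the textbook Kahan compensated summation applied to a dot product, not the authors' SIMD-vectorized and unrolled implementation; the function names dot_naive and dot_kahan are hypothetical and chosen here for illustration only.

```c
#include <stddef.h>

/* Naive scalar product: every addition may lose low-order bits,
 * and those rounding errors accumulate over the loop. */
double dot_naive(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Kahan-compensated scalar product (illustrative sketch).
 * c carries the rounding error of the previous addition and is
 * subtracted from the next product before it is accumulated.
 * Assumption: compiled without value-unsafe optimizations such as
 * -ffast-math, which may algebraically cancel the compensation. */
double dot_kahan(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    double c = 0.0;                 /* running compensation term */
    for (size_t i = 0; i < n; ++i) {
        double prod = a[i] * b[i];
        double y = prod - c;        /* apply the correction from the last step */
        double t = sum + y;         /* low-order bits of y may be lost here */
        c = (t - sum) - y;          /* recover what was lost (negated) */
        sum = t;
    }
    return sum;
}
```

The compensated loop performs four additions or subtractions per element instead of one, while the memory traffic per element is unchanged. This is consistent with the abstract's observation that the extra arithmetic adds almost no cost once the kernel is vectorized and unrolled.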

Similar resources

Performance analysis of the Kahan-enhanced scalar product on current multi- and manycore processors

Summary: We investigate the performance characteristics of a numerically enhanced scalar product (dot) kernel loop that uses the Kahan algorithm to compensate for numerical errors, and describe efficient SIMD-vectorized implementations on recent multi- and manycore processors. Using low-level instruction analysis and the execution-cache-memory (ECM) performance model, we pinpoint the relevant perf...

On the accuracy and usefulness of analytic energy models for contemporary multicore processors

This paper presents refinements to the execution-cache-memory performance model and a previously published power model for multicore processors. The combination of both enables a very accurate prediction of performance and energy consumption of contemporary multicore processors as a function of relevant parameters such as number of active cores as well as core and Uncore frequencies. Model vali...

Microprocessor Thermal Analysis using the Finite Element Method

The microelectronics industry is pursuing many options to sustain the performance improvement expected every two years. One method for performance improvement is scaling transistor sizes down such that many more transistors can be compacted on chip. The on-chip temperature is a concern because the reliability and performance can be degraded due to hot spots. Thermal modeling of the chip will al...

Multicore Processors: Challenges, Opportunities, Emerging Trends

This paper undertakes a critical review of the current challenges in multicore processor evolution, underlying trends and design decisions for future multicore processor implementations. It is first shown that, to keep up with Moore's law during the last decade, the VLSI scaling rules for processor design had to be dramatically changed. In future multicore designs large quantities of dark s...

Technical Report UPC-DAC-RR-2010-2: Decomposable and Responsive Power Models for Multicore Processors using Performance Counters

Power modeling based on performance monitoring counters (PMCs) has attracted the interest of many researchers since it became a quick approach to understand and analyse power behavior on real systems. Moreover, several power-aware policies use power models to guide their decisions and to trigger low-level mechanisms (e.g., managing processor frequency). Hence, the information, the accuracy and the...

Publication date: 2015